Bike-sharing systems are a means of smart transportation in urban environments with the benefit of
a positive impact on urban mobility. In this paper we are interested in studying and modeling the
behavior of features that permit the end user to access, with her/his web browser, the status of the
Bike-Sharing system. In particular, we address features able to make a prediction on the system state.
We propose to use a machine learning approach to analyze usage patterns and learn computational
models of such features from logs of system usage.

On the one hand, machine learning methodologies provide a powerful and general means to
implement a wide choice of predictive features. On the other hand, trained machine learning models
are provided with a measure of predictive performance that can be used as a metric to assess the cost-performance trade-off of the feature. This provides a principled way to assess the runtime behavior
of different components before putting them into operation.


1 Introduction 

Product line engineering provides a way to manage variability during the entire design process [?] and
is an important means for identifying variability needs early on. In this context, a feature represents a
‘logical unit of behavior that is specified by a set of functional and quality requirements’ [20]. A Feature
Model is a compact representation of the commonalities and variabilities of the system, expressed as
mandatory and optional features. Variability is achieved through the selection of the features that will be
present in the final product.

In attributed feature models, quantitative, non-functional characteristics of features are captured by
attributes that are assigned to each feature. The use of attributed feature models is especially useful for
the decision-making process [20], as each stakeholder can make decisions taking into consideration both
the features and the characteristics of the final product. A number of techniques allow the configuration
of feature models based on both functional and non-functional requirements [?].

In this paper, we want to explore the possibility of applying a machine learning (ML) approach to
implement features and, at the same time, evaluate them to derive meaningful values to fill the attributed
feature model. We concentrate on predictive features that are able to analyse the current state and some
historical data, and provide some information to the user. More generally, the purpose of the analysis is
to evaluate the features and their possible combinations to help a stakeholder decide which product
of a line to deploy, making the best possible compromise between cost and usefulness. The stakeholder
may be a client that wants to buy a product or, as in this case, a supplier that wants to know, before
deployment, whether the prediction accuracy of a feature will be worth its cost. Finally, even when cost is not an
issue, it is useful to have an assessment of the features before putting them into operation, to be sure
they are accurate.
To illustrate our ideas, we focus on Bike-sharing systems (BSS), which are a sustainable means of
smart transportation with a positive impact on urban mobility. The quantitative analysis of bike-sharing
systems, seen as collective adaptive systems (CAS), is a case study of the European project QUANTICOL
(http://www.quanticol.eu). CAS consist of a large number of spatially distributed entities, which may be
competing for shared resources even when collaborating to reach common goals. The importance of
CAS in the context of urban mobility and in achieving societal goals means that it is necessary to carry
out a comprehensive analysis of their design and to investigate all aspects of their behavior before they are
put into operation.

In previous work [3, 4, 5], this case study was presented and a discrete feature model was defined, specifying
several kinds of non-functional quantitative properties and behavioral characteristics. In particular,
in [5] we established a chain of tools, each used to model a different aspect of the system, from feature
modeling to product derivation and from quantitative evaluation of the attributes of products to model
checking value-passing modal specifications.

This paper puts forward the idea of using Machine Learning (ML) methodologies to learn computational
models of the features from BSS usage data. Through these methodologies, it is possible to
learn the (unknown) relationship between the feature and its inputs by exploiting historical data representing
examples of such an input-output map. A trained model can then be used to provide predictions on
future values of the feature in response to new input information, i.e. providing an implementation of the
feature component. The advantage of such an approach is twofold: on the one hand, such methodologies
provide a powerful and general means to realize a wide choice of predictive features for which there exist
sufficient (and significant) historical data. On the other hand, trained ML models are provided with a
measure of predictive performance that can be used as a metric to assess the cost-performance trade-off
of the feature. This provides a principled way to assess the runtime behavior of different components
before putting them into operation.


2 Case Study 

Many cities are currently adopting fully automated public bike-sharing systems (BSS) as a green urban 
mode of transportation. The concept is simple: a user arrives at a station, takes a bike, uses it for a while 
and returns it to another station. A BSS can be conveniently considered and designed as a product line. 

The automation of these systems makes it possible to monitor the stations, to check whether borrowed bikes are
returned, to let the user pay for the usage, etc. In particular, a basic service of the system keeps track of
all bikes and maintains a complete picture of which bikes are docked at each station and which ones are
currently hired. For hired bikes, the system keeps track of the user name and departure station and time.
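The bookkeeping performed by this basic service can be pictured as a small data structure. The Python sketch below is only illustrative: the class and method names (`BikeRegistry`, `HireRecord`, `hire`, `return_bike`) are hypothetical and do not come from any real BSS implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class HireRecord:
    """What the basic service remembers about a bike that is currently hired."""
    user: str
    departure_station: str
    departure_time: datetime


@dataclass
class BikeRegistry:
    """Hypothetical registry of docked and hired bikes."""
    docked: dict[str, set[str]] = field(default_factory=dict)  # station -> bike ids
    hired: dict[str, HireRecord] = field(default_factory=dict)  # bike id -> hire record

    def return_bike(self, bike: str, station: str) -> None:
        """Dock a bike at a station, closing its hire record if there is one."""
        self.hired.pop(bike, None)
        self.docked.setdefault(station, set()).add(bike)

    def hire(self, bike: str, station: str, user: str, when: datetime) -> None:
        """Undock a bike and record who hired it, where and when."""
        bikes = self.docked.get(station, set())
        if bike not in bikes:
            raise ValueError(f"bike {bike} is not docked at {station}")
        bikes.remove(bike)
        self.hired[bike] = HireRecord(user, station, when)

    def bikes_at(self, station: str) -> int:
        """Number of bikes currently parked at a station."""
        return len(self.docked.get(station, set()))

    def is_empty(self, station: str) -> bool:
        """Empty stations are where a prediction of incoming bikes becomes useful."""
        return self.bikes_at(station) == 0


registry = BikeRegistry()
registry.return_bike("bike-1", "Station A")
registry.hire("bike-1", "Station A", "alice", datetime(2015, 5, 4, 8, 30))
print(registry.bikes_at("Station A"), registry.is_empty("Station A"))  # 0 True
```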

A user is interested in knowing the status of some station. If the station is empty, the
system may make a prediction and infer whether a bike will be returned to that station soon.

Taking into account the above condition, we can describe the feature model of the Status subsystem: it
comprises a mandatory feature (for the basic service) and two optional ones, which represent two
different ways to predict the arrival of a bike, as shown in Figure 1.

Feature AllBikesNow, mandatory in each product of the line, keeps the current status of the
service up to date and tells how many bikes are parked at each station. This is used by the Bike-Sharing system
administrators to know whether some stations are empty or full and bikes need to be redistributed. The final
user can access the status using their web browser before going to the station.

The LocationPreview feature predicts whether a bike is going to arrive at a given station, and estimates the
needed time. It uses GPS to locate the bike: knowing the departure station and the path so
far, the feature can predict the probability that the bike will arrive at a station of interest in the next few minutes,
and calculate the expected arrival time. Learning models are trained using historical traces of BSS
usage.

Figure 1: Status subsystem: the Feature Model

Feature UserProfile also offers the same kind of prediction, but uses different data: the log, for each 
user, of all the uses of the Bike-Sharing system. For each use of the system, the log contains: departure 
time and station; arrival time and station. Analysing these data, UserProfile can predict if one of the 
bikes currently in use will arrive at the station of interest. Again, the feature returns a probability and the 
expected arrival time. 
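To make the cost-performance idea concrete, the sketch below enumerates the products of the Status subsystem (AllBikesNow mandatory, LocationPreview and UserProfile optional) and aggregates two attributes per product. The attribute values are placeholders chosen only for illustration, not measurements; in the approach proposed here the accuracy attribute would come from the ML performance assessment. Taking the best accuracy among the selected predictive features, and breaking ties by lower cost, are assumptions of this sketch.

```python
from itertools import combinations

# Placeholder attribute values for illustration only, not measured data.
MANDATORY = {"AllBikesNow": {"cost": 1.0, "accuracy": 0.0}}
OPTIONAL = {
    "LocationPreview": {"cost": 5.0, "accuracy": 0.9},
    "UserProfile": {"cost": 2.0, "accuracy": 0.7},
}


def products():
    """Yield every valid product: all mandatory features plus any subset of optional ones."""
    names = list(OPTIONAL)
    for r in range(len(names) + 1):
        for chosen in combinations(names, r):
            yield list(MANDATORY) + list(chosen)


def evaluate(product):
    """Total cost and best predictive accuracy among the selected features."""
    attrs = [MANDATORY.get(f) or OPTIONAL[f] for f in product]
    cost = sum(a["cost"] for a in attrs)
    accuracy = max(a["accuracy"] for a in attrs)
    return cost, accuracy


def best_within_budget(budget):
    """Most accurate product whose cost fits the budget (cheaper one wins ties)."""
    feasible = [(evaluate(p), p) for p in products() if evaluate(p)[0] <= budget]
    return max(feasible, key=lambda item: (item[0][1], -item[0][0]))[1]


print(best_within_budget(4.0))  # ['AllBikesNow', 'UserProfile']
```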

In the next sections we provide a brief overview of ML and we discuss how these methodologies can
be exploited to realize the LocationPreview and UserProfile features of our BSS use case. Further, we
summarize the main ideas underlying performance assessment in ML models.


3 An Introduction to Machine Learning 

Machine learning provides computational models and methodologies to realize data-driven adaptive
approaches to data analysis, pattern discovery and recognition, as well as to the predictive modeling of
input-output data relationships. The term data-driven refers to the fact that ML approaches rely on (numerical)
information encoded in the data, which is typically vectorial (i.e. multivariate data in a vector
space) but can also be of relational type (i.e. compound information with a graph-based representation
where edges encode relationships between the atomic information pieces) [11].

ML is an active and wide research field comprising several paradigms, e.g. neural-inspired, probabilistic
and kernel-based approaches, and addressing a variety of computational learning task types. For the
purpose of product line feature modeling and evaluation, we focus on ML models and algorithms targeted
at solving supervised learning tasks. Supervised learning refers to a specific class of ML problems that
comprise learning an (unknown) map $M : \mathcal{X} \rightarrow \mathcal{Y}$ between input information $x$ (e.g. a vector of
attributes) and an output prediction $y$ (in general, a vector of different dimensionality with respect
to the input). Such an unknown map is learned from couples $\mathcal{D} = \{(x_1, y_1), \ldots, (x_N, y_N)\}$ of input-output
data, referred to as training examples, following a numerical routine targeted at the optimization of an
error/performance function $E(\mathcal{D})$ which measures the quality of the predictions generated by the ML
model.
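As an illustration of this notation, the sketch below represents the training examples as a list of (x, y) pairs and evaluates an error function E for a candidate map. Mean squared error is only one possible choice of E, and the linear model with its weights is a hypothetical example.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def mean_squared_error(model: Callable[[Vector], float],
                       data: Sequence[tuple[Vector, float]]) -> float:
    """E(D): average squared difference between predictions model(x_n) and targets y_n."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def linear_model(weights: Vector, bias: float) -> Callable[[Vector], float]:
    """A candidate map M(x) = w . x + b."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) + bias


# Three training examples (x_n, y_n) with two input attributes each.
D = [((1.0, 2.0), 5.0), ((0.0, 1.0), 2.0), ((2.0, 0.0), 3.0)]
M = linear_model(weights=(1.0, 2.0), bias=0.0)
print(mean_squared_error(M, D))  # only the last example is mispredicted (2 vs 3)
```

Training amounts to searching for the parameters (here `weights` and `bias`) that make this value as small as possible.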

ML models are characterized by two operational phases. The first is the training (or learning) phase,
where ground-truth teaching information (encoded in training samples) is used to adapt the parameters
regulating the response of the model so that its error (performance) $E(\mathcal{D})$ is reduced (increased,
respectively). The testing (or prediction) phase, instead, supplies a trained model with novel input information
(typically unseen at training time) to generate run-time predictions (i.e. to compute the learned map on
novel data). The two phases are not always disjoint: incremental learning approaches exist that continuously
adapt the parameters of an ML model while it keeps providing its predictions in response
to new input data.
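The two phases, and their incremental variant, can be sketched with a one-dimensional linear regressor trained by stochastic gradient descent on squared error. This is an illustrative choice of model and optimizer, not one prescribed by the text; the learning rate, the number of epochs and the toy data are arbitrary.

```python
import random


class OnlineLinearRegressor:
    """y_hat = w * x + b, fitted by stochastic gradient descent on squared error."""

    def __init__(self, learning_rate: float = 0.05):
        self.w = 0.0
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x: float) -> float:
        """Testing (prediction) phase: apply the learned map to new input."""
        return self.w * x + self.b

    def update(self, x: float, y: float) -> None:
        """One gradient step on (y_hat - y)^2 / 2 for a single example."""
        error = self.predict(x) - y
        self.w -= self.lr * error * x
        self.b -= self.lr * error

    def fit(self, data, epochs: int = 200, seed: int = 0) -> None:
        """Training phase: repeated passes over the training examples."""
        rng = random.Random(seed)
        data = list(data)
        for _ in range(epochs):
            rng.shuffle(data)
            for x, y in data:
                self.update(x, y)


train = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
model = OnlineLinearRegressor()
model.fit(train)             # training phase
print(model.predict(3.0))    # testing phase: close to 7.0, since the data follow y = 2x + 1
model.update(3.0, 7.0)       # incremental learning: adapt while the model stays in service
```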

In general, the final quality of the ML model predictions is influenced, on the one hand, by the quality
of the training data, which should represent a sufficient and significant sample of the relationship to be
modeled, and, on the other hand, by the adequacy of the learning model for the specific computational
learning task. In this sense, different tasks, associated with different features to be modeled, may require
learning models with different capabilities: in the following section, we analyze the nature of the
tasks associated with BSS feature prediction and we discuss which ML approaches are best suited to
address them.

4 Machine Learning for BSS Features 

Supervised learning approaches can be used to address modeling of product line features in our BSS
scenario, using logs of previous bike usage as training samples for the ML model. Here, we focus on the
realization of the UserProfile and LocationPreview features. These two features are paradigmatic of two
classes of learning tasks which require learning models of different nature and capabilities, i.e. static
models for vectorial data and dynamic models for sequential data.

4.1 User Profile 

The UserProfile feature is required to predict the destination station and arrival time of a bike given information
on its pickup details and knowledge of the BSS usage of the person who has picked up the
bike. An ML approach to realize such a feature requires training a different ML model for each user, using
the user's personal usage logs as training data. In other words, a training dataset contains vector couples $(x_n, y_n)$
where the input attributes $x_n$ are a numerical encoding of the departure time and station, while the output
$y_n$ encodes the associated destination station and arrival time. At run-time, the feature prediction
will be obtained by selecting the trained learning model associated with the user who has picked up the
bike and supplying it with the details of pickup time and station. Training of the learning models can
also be performed at run-time: for instance, when a new customer subscribes to the service, the system
starts collecting his/her usage information; as soon as sufficient usage data is collected, it is used to train
a new learning model specific to the usage patterns of the customer. The same approach can be used to
keep the knowledge encoded in an existing customer model up to date: new examples are added to
the log as the customer uses the system, and the learning model is re-trained as soon as a
sufficient amount of new data is collected.
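The per-user training and re-training policy can be sketched as a registry keyed by user. Everything here is an assumption for illustration: the names, the threshold `MIN_NEW_SAMPLES`, and the deliberately trivial model, which, for each departure station, predicts the arrival station most frequently seen in the user's log and the mean trip duration (ignoring departure time for brevity). Any of the classifiers and regressors discussed in this section could take its place.

```python
from collections import Counter, defaultdict

MIN_NEW_SAMPLES = 3  # hypothetical: re-train after this many new log entries


class FrequencyModel:
    """Trivial per-user model: most frequent arrival and mean duration per departure station."""

    def __init__(self, log):
        arrivals = defaultdict(Counter)
        durations = defaultdict(list)
        for dep_station, arr_station, minutes in log:
            arrivals[dep_station][arr_station] += 1
            durations[dep_station].append(minutes)
        self.arrival = {s: c.most_common(1)[0][0] for s, c in arrivals.items()}
        self.duration = {s: sum(d) / len(d) for s, d in durations.items()}

    def predict(self, dep_station):
        return self.arrival.get(dep_station), self.duration.get(dep_station)


class UserProfileRegistry:
    def __init__(self):
        self.logs = defaultdict(list)    # user -> list of (departure, arrival, minutes)
        self.pending = defaultdict(int)  # user -> entries not yet used for training
        self.models = {}                 # user -> trained model

    def record_trip(self, user, dep_station, arr_station, minutes):
        """Log a completed trip; (re-)train once enough new data has been collected."""
        self.logs[user].append((dep_station, arr_station, minutes))
        self.pending[user] += 1
        if self.pending[user] >= MIN_NEW_SAMPLES:
            self.models[user] = FrequencyModel(self.logs[user])
            self.pending[user] = 0

    def predict(self, user, dep_station):
        """Select the model of the user who picked up the bike; None if not trained yet."""
        model = self.models.get(user)
        return model.predict(dep_station) if model else None
```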

The prediction of the arrival station is an instance of a classification problem, whose objective is to
assign an input pattern to one of $K$ different and finite classes (bike stations, in our case study). The
prediction of the time to destination, on the other hand, is an example of a regression task, where we
are required to predict a generic (possibly continuous-valued) output in response to the input pattern.
These two problems can be more effectively solved by resorting to two separate and specialized learning
models. The prediction of a probability estimate of class membership in place of a hard class assignment
can be easily achieved by using a one-of-$K$ encoding of the classifier output coupled with a soft-max
between the $K$ outputs. The one-of-$K$ encoding represents the fact that the $n$-th sample belongs to class
$k$ (out of $K$) by a $K$-dimensional output vector having zeros on all components except for its $k$-th
element $y_n(k)$, which is set to one. As a result, a trained learning model provided with an input $x$ at
testing phase will produce a $K$-dimensional prediction $\hat{y}$; the corresponding soft-max output $\tilde{y}$ will again
be a $K$-dimensional vector whose $k$-th element is


$$\tilde{y}(k) = \frac{\exp(\hat{y}(k))}{\sum_{j=1}^{K} \exp(\hat{y}(j))}$$

Data involved in the UserProfile feature is of static type, that is, each training sample is a pair of
vectors and the samples are independent and identically distributed. The majority of the learning models in the literature
have been designed to deal with such static vectorial data. For the purpose of implementing the UserProfile
feature, it is worth mentioning Support Vector Machines (SVMs), a family of supervised learning
models which construct separating hyperplanes between the training vectors and exploit them to perform
classification and regression [19]. SVMs build on the concept of a linear separator (i.e. the separating
hyperplane) and extend it to deal with non-linear problems by exploiting the so-called kernel trick, that
is, an implicit mapping of the input vector into a high-dimensional feature space by means of a non-linear
map induced by a kernel function. SVMs are highly effective classifiers and regressors for a wide class of
learning problems, and several stable implementations are freely available [13, 19]. SVM training can be
computationally demanding due to hyperparameter search, which can be a limiting factor for their use
in run-time training. Further, the result of a trained SVM is difficult to interpret. When interpretation
of the results is an issue, probabilistic learning models have found wide application: Naive Bayes and logistic
regression are popular approaches [11], although they are based on strong probabilistic assumptions which can
be relaxed by resorting to more general Bayesian Networks [?].
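As an example of the interpretable probabilistic models mentioned above, the sketch below implements a categorical Naive Bayes classifier from scratch, predicting an arrival station from discretized departure attributes. The attribute choice (departure station and a time-of-day bucket), the add-alpha smoothing constant `alpha` and the toy records are illustrative assumptions, not data from the case study.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical attributes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[i][c][v]: how often attribute i took value v among samples of class c
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # attribute index -> observed values
        for x, c in zip(X, y):
            for i, v in enumerate(x):
                self.counts[i][c][v] += 1
                self.values[i].add(v)
        return self

    def predict_proba(self, x):
        """Posterior P(c | x), proportional to P(c) * prod_i P(x_i | c)."""
        log_post = {}
        for c, n_c in self.class_counts.items():
            lp = math.log(n_c / self.n)
            for i, v in enumerate(x):
                num = self.counts[i][c][v] + self.alpha
                den = n_c + self.alpha * len(self.values[i])
                lp += math.log(num / den)
            log_post[c] = lp
        top = max(log_post.values())  # normalise in log space for stability
        z = sum(math.exp(lp - top) for lp in log_post.values())
        return {c: math.exp(lp - top) / z for c, lp in log_post.items()}

    def predict(self, x):
        proba = self.predict_proba(x)
        return max(proba, key=proba.get)


# Toy records: (departure station, time of day) -> arrival station.
X = [("A", "morning"), ("A", "morning"), ("A", "evening"), ("B", "evening")]
y = ["C", "C", "D", "D"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict_proba(("A", "morning")))
```

Each factor P(x_i | c) can be read off the fitted counts, which is what makes this family easier to interpret than a trained SVM.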


4.2 Location Preview 

The LocationPreview feature predicts the same output as the UserProfile feature using different input
data, namely GPS trajectories corresponding to journeys performed by the BSS users. Trajectory data
encodes a form of dynamical information of a different nature with respect to the static vectorial data in
UserProfile, requiring a radically different ML approach. A GPS trajectory is a form of sequential data,
a type of structured information where the observation at a given point of the sequence is dependent
on the context provided by the preceding or succeeding elements of the sequence. Such contextual
information also plays a role in the learning task where, for instance, the decision on which will be the
arrival station corresponding to a GPS trajectory cannot be taken based on the observation of a single
element of the sequence, but should rather take into account the context provided by the full sequence or
by a part of it. This requires learning models that can take such contextual information into consideration
when computing their predictions, that is, ML models for sequential/time-series data.

A straightforward approach to the problem is to use models for static data (such as those seen for the UserProfile feature),
feeding them with a fixed-size chunk of the input sequence. This window of observations can be slid
across the full length of the sequence, providing a prediction for each sequence element that can take
into consideration the surrounding elements up to the window length. The key issue of such an approach
is how to determine the correct size of the window for each learning problem. To address this issue,
learning models have been proposed that are capable of maintaining a memory of the history of the
input signals and of using it to compute their predictions. Recurrent Neural Networks (RNNs) [?] are
ML models that have been proposed specifically to deal with the dynamics of sequential information.
They extend the original artificial neural network paradigm with feedback connections that introduce
a dynamic memory of the neuron activation, which can be used to encode short- to long-term dependencies
among the elements of the sequence, depending on the specific network architecture.

In this context, the use of Reservoir Computing (RC) [?] has gained increasing interest as a modeling method for RNNs, due
to its ability to combine computational efficiency with the RNN capability of dealing with learning
in temporal sequence domains. The underlying idea of the RC approach is to use a layer of sparsely
connected recurrent neurons whose connections are initialized and left untrained; adaptation of the neural
weights is restricted to the layer of output neurons. This considerably reduces the computational
complexity of training, which is a key issue if training is performed at run-time. RC models appear well
suited for the implementation of the LocationPreview feature: in particular, they have already shown
considerable efficacy in closely related learning tasks, such as the prediction of the destination room of
trajectories of users walking in indoor environments [?].
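A minimal echo state network, one common form of Reservoir Computing, illustrates the idea: the input and recurrent weights are drawn at random, rescaled to a spectral radius below one, and never trained; only a linear readout is fitted, here by ridge regression from the final reservoir state of each sequence to a one-of-K target. The sketch assumes NumPy; the reservoir size, spectral radius, sparsity, ridge parameter and the synthetic two-class "trajectories" are illustrative assumptions, not settings taken from the cited works.

```python
import numpy as np


class EchoStateNetwork:
    """Fixed random reservoir plus a trained linear readout (a basic echo state network)."""

    def __init__(self, n_inputs, n_reservoir=100, spectral_radius=0.9,
                 density=0.1, ridge=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1.0, 1.0, (n_reservoir, n_inputs))
        W = rng.uniform(-1.0, 1.0, (n_reservoir, n_reservoir))
        W *= rng.random((n_reservoir, n_reservoir)) < density  # sparse connectivity
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W          # recurrent weights: initialised once, never trained
        self.ridge = ridge
        self.W_out = None   # readout: the only trained parameters

    def final_state(self, sequence):
        """Drive the reservoir with a (T, n_inputs) sequence and return its last state."""
        h = np.zeros(self.W.shape[0])
        for u in sequence:
            h = np.tanh(self.W_in @ u + self.W @ h)
        return h

    def fit(self, sequences, targets):
        """Ridge regression from final reservoir states to one-of-K targets."""
        H = np.array([self.final_state(s) for s in sequences])
        Y = np.asarray(targets, dtype=float)
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.W_out = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, sequence):
        """One score per destination class; the argmax is the predicted class."""
        return self.final_state(sequence) @ self.W_out


# Synthetic 2-D "trajectories": noisy paths heading east (class 0) or north (class 1).
rng = np.random.default_rng(1)


def trajectory(direction, steps=20):
    moves = 0.1 * np.tile(direction, (steps, 1)) + rng.normal(0.0, 0.02, (steps, 2))
    return np.cumsum(moves, axis=0)


east, north = np.array([1.0, 0.0]), np.array([0.0, 1.0])
train_seqs = [trajectory(east) for _ in range(20)] + [trajectory(north) for _ in range(20)]
train_targets = [[1, 0]] * 20 + [[0, 1]] * 20
esn = EchoStateNetwork(n_inputs=2).fit(train_seqs, train_targets)
predicted_class = int(np.argmax(esn.predict(trajectory(north))))
```

Training only solves one linear system for `W_out`, which is what keeps the cost low enough to consider re-training at run-time.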

4.3 Discussion 

The work reported in [10] applies ML to BSS data, but it focuses mainly on mining usage models of
BSS with the aim of identifying template behaviors which can be used as demand profiles for system
management. In contrast, we propose to use ML as a modeling tool to build system features that can
be used at run-time by the user application. At the same time, as explained in the next section, we
propose to use the learning model error functions as part of the pre-deployment analysis to assess the
cost-performance trade-off of the features to be included in the final BSS deployment. Finally, we also
take into account the dynamic nature of trajectory data by using appropriate ML models, such as the RC
approach, instead of adapting static models to perform spatio-temporal data analysis [?].

5 Performance evaluation of ML models 

A key aspect of ML models is the assessment of their predictive performance. Good ML practice envisages
a three-step process to build effective predictive learning models and reliably assess their performance.

1. Training, which consists of adapting the parameters of the learning models using training data and
numerical routines that optimize the model performance function (error).

2. Model selection, which consists of estimating the performance achieved by different learning models,
including different hyper-parameter settings (i.e. model-tuning parameters set by the developer),
in order to select the best model (with respect to the performance function).

3. Final assessment, which consists of evaluating the performance of the selected model on new data,
providing a measure of the generalization performance of the ultimately chosen model.
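The three steps can be sketched on a toy regression problem with NumPy: polynomial least-squares fits play the learning models, the polynomial degree is the hyper-parameter, and the data are split once into disjoint training, validation and test portions. The model family, the split proportions, the use of MAE for selection and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, x.size)  # synthetic data

# Three disjoint portions: 60% training, 20% validation, 20% final test.
train, val, test = np.split(rng.permutation(x.size), [180, 240])


def mae(pred, target):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(pred - target)))


# 1. Training: a least-squares polynomial fit for each candidate degree (the hyper-parameter).
fitted = {d: np.polyfit(x[train], y[train], d) for d in range(1, 10)}

# 2. Model selection: keep the degree with the lowest validation error.
best_degree = min(fitted, key=lambda d: mae(np.polyval(fitted[d], x[val]), y[val]))

# 3. Final assessment: error of the chosen model on data used in neither step above.
test_mae = mae(np.polyval(fitted[best_degree], x[test]), y[test])
print(best_degree, test_mae)
```

Only `test_mae` is an unbiased estimate of how the chosen model would behave once deployed, because the validation error has already been used to make a choice.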

Clearly, the latter step can be interpreted as a robust estimate of the performance of the feature
implemented by the learning model when deployed in the run-time system. As such, it can be used as part
of the product line analysis to straightforwardly assess the efficiency-cost trade-off of the features implemented
by ML models. Note that such an evaluation step can, in principle, exploit data logged by an existing
system deployed by another client, different from the one being developed. Such an estimated
performance will resemble that of the actual deployment only if the usage data
available is coherent with the expected usage of the system under development. For instance, it is to
be expected that trajectory data for cities with considerably different BSS scales and topologies will not
provide an adequate ground for comparison.
The three steps above can be implemented through a cross-validation scheme. The popular $K$-fold
cross-validation would partition the available usage data into $K$ equally sized subsets, using $K - 1$
groups for the first step while using the held-out subset to assess the model performance (second step).
This procedure is repeated for each of the $K$ possible choices of the held-out group and the performance
is then averaged over such $K$ choices. In the simplest scheme, this latter performance is used as the final
assessment of the model (third step). However, when key model selection choices are required in the
second step, these are taken on the $K$-fold averaged performance, while the final assessment is computed
on a completely external test set of hold-out data never used in the $K$-fold process. The actual form
of the performance measure depends on the nature of the learning task, but it typically evaluates the
discrepancy between the output predicted by the learning models and the desired (ground-truth) output.
The Mean Absolute Error (MAE) is a popular choice to estimate the performance in regression tasks, computed as
the absolute value of the difference between the model output and the expected target output, averaged
over the number of samples under consideration. For classification tasks, performance is often assessed
as class accuracy

$$acc_i = \frac{TP_i + TN_i}{N}$$

where $N$ is the total number of samples under consideration, while $TP_i$ and $TN_i$ are the number of true positive
and true negative classifications predicted by the model for the $i$-th class.
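A plain-Python sketch of the K-fold scheme with an external test set, together with the two performance measures. The `fit`/`predict` callables stand for any learning routine; the two candidate models, the toy data and the shuffled fold assignment are assumptions for illustration only.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and partition the indices into k (nearly) equally sized folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def mae(pred, target):
    """Mean Absolute Error over the samples under consideration."""
    return mean(abs(p - t) for p, t in zip(pred, target))


def class_accuracy(pred, target, c):
    """One-vs-rest accuracy for class c: (TP_c + TN_c) / N."""
    tp = sum(p == c and t == c for p, t in zip(pred, target))
    tn = sum(p != c and t != c for p, t in zip(pred, target))
    return (tp + tn) / len(target)


def cross_validate(X, y, fit, predict, k=5):
    """Held-out MAE averaged over the k choices of the held-out fold."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        train = [i for i in range(len(X)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(mae([predict(model, X[i]) for i in fold], [y[i] for i in fold]))
    return mean(scores)


X = [[float(i)] for i in range(20)]
y = [2.0 * x[0] for x in X]
candidates = {  # hypothetical (fit, predict) pairs
    "mean": (lambda X, y: mean(y), lambda m, x: m),
    "slope-through-origin": (
        lambda X, y: sum(a[0] * b for a, b in zip(X, y)) / sum(a[0] ** 2 for a in X),
        lambda m, x: m * x[0]),
}
# Model selection uses only the development data; the last 5 samples are the external test set.
dev_X, dev_y, test_X, test_y = X[:15], y[:15], X[15:], y[15:]
best = min(candidates, key=lambda name: cross_validate(dev_X, dev_y, *candidates[name]))
fit, predict = candidates[best]
model = fit(dev_X, dev_y)
test_mae = mae([predict(model, x) for x in test_X], test_y)
```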

6 Conclusions 

We have discussed how a machine learning approach can be used to both implement and evaluate predictive
Product Line Features. We addressed the case study of the European project QUANTICOL,
concerning the quantitative analysis of bike-sharing systems (BSS). The features required by the case
study are paradigmatic of two classes of learning tasks which require learning models of different nature
and capabilities, i.e. static models for vectorial data and dynamic models for sequential data.

Such models are trained on historical usage data to realize a deployable implementation of the feature.
In addition, a trained computational learning model is characterized by a measure of predictive
performance that can be used to assess the cost-performance trade-off of the feature before putting
it into operation.

We are currently training and validating a learning model for the UserProfile feature using real-world
usage data comprising more than 280,000 entries of the form

(UserID, leave station, leave date and time, return station, return date and time)

covering all hires in Pisa across two years. As concerns the LocationPreview feature, since bikes
in Pisa are not equipped with GPS, we will look for data from other towns, which may in any case
provide a measure of predictive performance and let the stakeholders assess whether it is worth buying
GPS trackers. An alternative solution is to use data coming from a simulation.
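Entries of this form map naturally to UserProfile training pairs: the departure part becomes the input and the return part the target. The sketch below assumes, hypothetically, comma-separated records with timestamps written as `YYYY-MM-DD HH:MM`; the actual format of the Pisa data is not specified here, and the sample record is invented for illustration.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple

TIME_FORMAT = "%Y-%m-%d %H:%M"  # hypothetical timestamp format


class HireRecord(NamedTuple):
    user_id: str
    leave_station: str
    leave_time: datetime
    return_station: str
    return_time: datetime


def parse_hires(text):
    """Parse CSV rows (UserID, leave station, leave time, return station, return time)."""
    for user, leave_st, leave_t, ret_st, ret_t in csv.reader(io.StringIO(text)):
        yield HireRecord(user, leave_st, datetime.strptime(leave_t, TIME_FORMAT),
                         ret_st, datetime.strptime(ret_t, TIME_FORMAT))


def to_example(h: HireRecord):
    """Input: departure station, weekday and hour; target: arrival station and minutes."""
    x = (h.leave_station, h.leave_time.weekday(), h.leave_time.hour)
    y = (h.return_station, (h.return_time - h.leave_time).total_seconds() / 60)
    return x, y


sample = "u42,Station A,2014-03-03 08:10,Station B,2014-03-03 08:25\n"
examples = [to_example(h) for h in parse_hires(sample)]
```

Grouping the resulting pairs by `user_id` yields the per-user training sets that the UserProfile feature requires.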

A general question is indeed associated with the concept of continuous learning, that is, deciding
when to activate model training and how to keep the feature up to date with respect to the availability of
new usage data. Such choices can have an impact on the prediction accuracy as well as on the stability
of the learning model. For this specific aspect, we will initially rely on expert knowledge, but we will
also explore possible automated decision processes.

To assess the cost-performance trade-off of the features, we plan to use Clafer, a general-purpose
modeling language designed to represent domains, meta-models, components and variability models,
like Feature models. Clafer has already been applied to the modeling and optimization of product lines